- North America > Canada (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition
Guanghao Ye, Khiem Duc Pham, Xinzhi Zhang, Sivakanth Gopi, Baolin Peng, Beibin Li, Janardhan Kulkarni, Huseyin A. Inan
Recent AI advancements, such as OpenAI's new models, are transforming LLMs into LRMs (Large Reasoning Models) that perform reasoning during inference, taking extra time and compute for higher-quality outputs. We aim to uncover the algorithmic framework for training LRMs. Methods like self-consistency, PRM, and AlphaZero suggest that reasoning is a form of guided search. We ask: what is the simplest, most scalable way to enable search in LLMs? We propose a post-training framework called Reinforcement Learning via Self-Play (RLSP). RLSP involves three steps: (1) supervised fine-tuning with human or synthetic demonstrations of the reasoning process, (2) using an exploration reward signal to encourage diverse and efficient reasoning behaviors, and (3) RL training with an outcome verifier to ensure correctness while preventing reward hacking. Our key innovation is to decouple the exploration and correctness signals during PPO training, carefully balancing them to improve performance and efficiency. Empirical studies in the math domain show that RLSP improves reasoning. On the Llama-3.1-8B-Instruct model, RLSP boosts performance by 23% on the MATH-500 test set; on AIME 2024 math problems, Qwen2.5-32B-Instruct improved by 10% with RLSP. A more important finding of this work, however, is that models trained with RLSP, even with the simplest exploration reward that encourages the model to take more intermediate steps, exhibited several emergent behaviors such as backtracking, exploration of ideas, and verification. These findings suggest that the RLSP framework might be enough to enable the emergence of complex reasoning abilities in LLMs when scaled. Lastly, we propose a theory for why the RLSP search strategy is well suited to LLMs, inspired by a remarkable result showing that CoT provably increases the computational power of LLMs, and that this power grows with the number of CoT steps \cite{li2024chain,merrill2023expresssive}.
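The abstract's key innovation — keeping the correctness (outcome-verifier) signal and the exploration signal as separate, independently weighted terms during PPO — can be sketched as a reward function. This is a minimal illustrative sketch, not the paper's implementation; the function name, `alpha` weight, and step cap are my own assumptions.

```python
# Illustrative sketch of RLSP-style decoupled rewards (names and constants
# are assumptions, not taken from the paper).

def rlsp_reward(is_correct: bool, num_steps: int,
                alpha: float = 0.1, max_steps: int = 50) -> float:
    """Combine an outcome-verifier signal with a simple exploration bonus.

    The two signals stay separate terms so they can be balanced
    independently (here via alpha), mirroring the decoupling the
    abstract describes.
    """
    correctness = 1.0 if is_correct else 0.0
    # Simplest exploration reward from the abstract: encourage more
    # intermediate reasoning steps, capped to limit reward hacking
    # via unbounded length.
    exploration = alpha * min(num_steps, max_steps) / max_steps
    return correctness + exploration

print(rlsp_reward(True, 50))   # correct answer with many steps
print(rlsp_reward(False, 10))  # wrong answer, small exploration bonus only
```

The cap matters: without it, a length-based bonus alone would be trivially gamed, which is why the framework also relies on the outcome verifier.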
- Workflow (1.00)
- Research Report > New Finding (1.00)
Partial Proof of a Conjecture with Implications for Spectral Majorization
In this paper we report new developments based on the proven cases of a surprising conjecture relating to special properties of n × n positive definite (PD) matrices for n ≤ 6. It is argued in [8] that traditional mathematics has focused primarily on results that hold generally for all n, whereas most theoretical physics models, and most applied mathematics and engineering problems, are intrinsically defined in a fixed (and small) number of dimensions, e.g., time and the three spatial dimensions of ordinary experience. What is not commonly recognized is that as soon as the dimensionality of a problem becomes fixed, e.g., to 3 dimensions, opportunities exist to establish potentially useful properties of the system of interest that do not hold generally in higher dimensions. Unfortunately, proving such properties typically requires computer-assisted methods that are not familiar to most scientists, mathematicians, and engineers. In Section 2, we illustrate these statements by showing how the IRGA conjecture for n ≤ 3 is relatively straightforward to establish by hand, while proof of the n = 4 case was accomplished using powerful computer-assisted proof methods. We discuss why such methods are likely required, and why proofs for the remaining n = 5 and n = 6 cases are likely beyond the capabilities of current state-of-the-art methods on even the most powerful supercomputers. In Section 3, we describe how proven cases of the IRGA conjecture define a fixed-dimensional family of matrices for which the diagonal majorizes the spectrum [9]. In Section 4, we present new results showing that Kronecker products of these matrices retain this unique majorization property. Then, in Section 5, we conclude with considerations on the imminent arrival of AI-based theorem provers, which can be viewed as proof oracles rather than computer-assisted tools.
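The majorization relation the abstract refers to (one vector's sorted partial sums dominating another's, with equal totals) is easy to check numerically. Below is a minimal sketch, with the function name `majorizes` my own; it only illustrates the majorization test itself, not the IRGA conjecture or the specific matrix family from [9].

```python
import numpy as np

def majorizes(a, b, tol=1e-9):
    """Return True if vector a majorizes vector b: both have equal sums,
    and every prefix sum of a (sorted descending) dominates that of b."""
    a = np.sort(np.asarray(a, dtype=float))[::-1]
    b = np.sort(np.asarray(b, dtype=float))[::-1]
    if abs(a.sum() - b.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(a) >= np.cumsum(b) - tol))

# For a PD matrix one can compare its diagonal against its spectrum:
A = np.diag([2.0, 1.0])          # trivial case: diagonal equals spectrum
eigs = np.linalg.eigvalsh(A)     # eigenvalues of a symmetric matrix
print(majorizes(np.diag(A), eigs))
```

Note that the trace equals the sum of the eigenvalues for any square matrix, so the equal-sums precondition always holds when comparing a matrix's diagonal with its spectrum.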
- North America > United States > Missouri > Boone County > Columbia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China (0.04)
Learning from i.i.d. data under model miss-specification
This paper introduces a new approach to learning from i.i.d. data under model misspecification. This approach casts the problem of learning as minimizing the expected code-length of a Bayesian mixture code. To solve this problem, we build on PAC-Bayes bounds, information theory, and a new family of second-order Jensen bounds. The key insight of this paper is that the use of standard (first-order) Jensen bounds in learning is suboptimal when the model class is misspecified (i.e., it does not contain the data-generating distribution). As a consequence of this insight, this work provides strong theoretical arguments for why the Bayesian posterior is not optimal for making predictions that generalize under model misspecification: the Bayesian posterior is directly tied to the use of first-order Jensen bounds. We then argue for the use of second-order Jensen bounds, which leads to new families of learning algorithms. In this work, we introduce novel variational and ensemble learning methods based on the minimization of a novel family of second-order PAC-Bayes bounds over the expected code-length of a Bayesian mixture code. Using this new framework, we also provide novel hypotheses for why parameters in a flat minimum generalize better than parameters in a sharp minimum.
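The gap the abstract exploits can be seen in a few lines of arithmetic. For a uniform mixture of models with likelihoods p_k(x), the exact code-length of x is -log(mean_k p_k(x)), while the first-order Jensen bound that the standard Bayesian posterior effectively optimizes is the looser mean_k(-log p_k(x)). The sketch below is purely illustrative (function names are mine, and it uses a uniform two-model mixture, not the paper's algorithms):

```python
import numpy as np

def mixture_code_length(likelihoods):
    """Exact code-length of one datum under a uniform Bayesian mixture code."""
    return -np.log(np.mean(likelihoods))

def first_order_bound(likelihoods):
    """First-order Jensen upper bound on the mixture code-length
    (the quantity the standard Bayesian posterior is tied to)."""
    return -np.mean(np.log(likelihoods))

p = np.array([0.9, 0.1])          # two models' likelihoods for one datum
print(mixture_code_length(p))      # exact: -log(0.5) ~= 0.693
print(first_order_bound(p))        # looser: ~1.204
```

When the models disagree (as they will under misspecification), the first-order bound is strictly looser, which is the intuition behind preferring tighter second-order bounds.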
- Europe > Spain (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Approximate Inference by Compilation to Arithmetic Circuits
Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first approximate inference methods based on ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2-F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2-G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines.
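An arithmetic circuit is a DAG of sum, product, and leaf (indicator/parameter) nodes; inference amounts to one bottom-up evaluation pass. The toy sketch below shows that structure for a single binary variable (class names and the encoding are my own illustration, not the paper's representation):

```python
# Minimal illustrative arithmetic circuit, evaluated bottom-up.

class Leaf:
    """Indicator or parameter node holding a constant value."""
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class Sum:
    def __init__(self, children):
        self.children = children
    def eval(self):
        return sum(c.eval() for c in self.children)

class Product:
    def __init__(self, children):
        self.children = children
    def eval(self):
        out = 1.0
        for c in self.children:
            out *= c.eval()
        return out

# One binary variable X with P(X=1) = 0.7. Indicator leaves encode the
# evidence X = 1; the circuit returns the probability of the evidence.
theta1, theta0 = 0.7, 0.3
ind1, ind0 = 1.0, 0.0
circuit = Sum([Product([Leaf(ind1), Leaf(theta1)]),
               Product([Leaf(ind0), Leaf(theta0)])])
print(circuit.eval())
```

The point of compilation is that this single evaluation pass replaces a summation whose cost would otherwise grow with treewidth; shared subcircuits (from context-specific independence and determinism) keep the circuit compact.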
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Oregon > Lane County > Eugene (0.14)
- North America > United States > Oregon > Multnomah County > Portland (0.04)